Big Tech Stock Prices

Analyzing Big Tech stock prices from 2010 to 2022

Author

Tech Titans

Abstract

The goal of this project is to understand the temporal behavior of Big Tech stocks that are actively traded on the NYSE. This analysis will show trends in the market to understand times when the market was on a downtrend (bear market) or in an uptrend (bull market). This will provide insight into times when investments should be made or when they should be pulled back. Investment analyses such as the aformentioned are extremely valuable to large finanacial institutions that aim to benefit from growth in the market. This is extremely important for those who place their hard earned dollar with these institutions as a means to save for retirement.

The analysis will consist of answering two key questions in a sequential manner: exploratory data analysis (EDA), data wrangling, and data visualization. This process is key to gaining insight from the data and allows for a clean platform that can used to train machine learning models.

# Importing the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from scipy import stats

# Loading Data
stocks = pd.read_csv('data/big_tech_stock_prices.csv')
companies = pd.read_csv('data/big_tech_companies.csv')

Q1: How do stock prices change over time, looking at the basic information like open, close, high, low etc.**

EDA

# Step 1: Exploratory Data Analysis (EDA)
df = stocks.copy()

# Convert the date column to datetime format
df['date'] = pd.to_datetime(df['date'])

# Summary Statistics
company_stats = df.groupby('stock_symbol').agg({'open': ['mean', 'median', 'min', 'max', 'std'], 'volume': ['mean', 'median', 'min', 'max', 'std']})
print("Summary Statistics:")
print(company_stats)
Summary Statistics:
                    open                                                 \
                    mean      median        min         max         std   
stock_symbol                                                              
AAPL           51.274171   29.745001   6.870357  182.630005   47.316509   
ADBE          186.023938   97.589996  22.969999  696.280029  173.562115   
AMZN           58.937204   36.325001   5.296500  187.199997   54.138607   
CRM           103.419948   76.290001  15.522500  310.000000   71.287894   
CSCO           33.493880   29.500000  13.930000   64.040001   12.643887   
GOOGL          49.149625   38.521000  10.968719  151.250000   35.809816   
IBM           148.427471  143.173996  90.439774  205.908218   23.998357   
INTC           36.481406   34.270000  17.879999   68.199997   12.912534   
META          147.874275  141.620002  18.080000  381.679993   86.734656   
MSFT          100.048490   55.660000  23.090000  344.619995   88.228008   
NFLX          188.242206  110.010002   6.960000  692.349976  178.967463   
NVDA           50.560335   11.902500   2.180000  335.170013   69.538684   
ORCL           46.242177   41.750000  21.459999  104.290001   16.864193   
TSLA           58.859467   16.229000   1.076000  411.470001   95.677282   

                    volume                                                   
                      mean       median       min         max           std  
stock_symbol                                                                 
AAPL          2.563255e+08  166674000.0  35195900  1880998000  2.225768e+08  
ADBE          3.814337e+06    2948500.0    589200   108752400  3.598144e+06  
AMZN          8.833999e+07   74592000.0  17626000   848422000  5.309249e+07  
CRM           6.910973e+06    5548800.0   1084700    64562800  5.048860e+06  
CSCO          3.269656e+07   25482400.0   5720500   560040200  2.570963e+07  
GOOGL         6.018647e+07   41234000.0   9312000   592399008  4.957963e+07  
IBM           5.036545e+06    4345189.0   1247878    39814421  2.772073e+06  
INTC          3.607170e+07   29874600.0   5893800   199002600  2.123178e+07  
META          3.117815e+07   23239000.0   5913100   573576400  2.713267e+07  
MSFT          3.801647e+07   32280800.0   7425600   319317900  2.147328e+07  
NFLX          1.841485e+07   11961800.0   1144000   315541800  2.054316e+07  
NVDA          5.080613e+07   43395600.0   4564400   369292800  3.210953e+07  
ORCL          1.801856e+07   14699800.0   2754900   183503900  1.251053e+07  
TSLA          9.351647e+07   75914250.0   1777500   914082000  8.164780e+07  

Data Wrangling

# Step 2: Data Wrangling
# Handle missing values(No missing values)

Data Visualization

# Initialize an empty list to store candlestick traces for all companies
candlestick_traces = []

# Loop through each company and create a candlestick trace
for company in stocks['stock_symbol'].unique():
    df_company = stocks[stocks['stock_symbol'] == company]
    candlestick_trace = go.Candlestick(x=df_company['date'],
                                       open=df_company['open'],
                                       high=df_company['high'],
                                       low=df_company['low'],
                                       close=df_company['close'],
                                       name=f"{company} Candlestick")
    candlestick_traces.append(candlestick_trace)

# Create the figure
fig = go.Figure(data=candlestick_traces)

# Customize the layout
fig.update_layout(title='Candlestick Chart for 14 Companies',
                  xaxis_title='Date',
                  yaxis_title='Price',
                  xaxis_rangeslider_visible=False,
                  height=600,
                  width=1000)

# Show the figure
fig.show()

Closing Prices Over Time for Different Companies

stocks = companies['stock_symbol'].values.tolist()
# Creating a DataFrame to hold all the stock data
# For simplicity, let's generate some random closing prices
all_stock_data = df

# Create a Plotly figure
fig = go.Figure()

# Add a line for each stock
for stock in stocks:
    stock_data = df[df['stock_symbol'] == stock]
    fig.add_trace(go.Scatter(x=stock_data['date'], y=stock_data['close'], mode='lines', name=stock))

# Customize the layout
fig.update_layout(
    title='Stock Close Prices Over Time',
    xaxis_title='Date',
    yaxis_title='Close Price',
    height=400
)

# Show the figure
fig.show()

Volume Analysis using Bar Plot

# Calculate average volume traded for each stock symbol
avg_volume = df.groupby('stock_symbol')['volume'].mean().sort_values()

# Create a bar trace
bar_trace = go.Bar(
    x=avg_volume.index,  # Stock symbols
    y=avg_volume.values,  # Average volume
    marker_color='skyblue'
)

# Create a layout
layout = go.Layout(
    title='Average Volume Traded for Each Stock Symbol',
    xaxis=dict(title='Stock Symbol'),
    yaxis=dict(title='Average Volume (Shares Traded)'),
    xaxis_tickangle=-45,
    margin=dict(l=40, r=40, t=60, b=20)
)

# Create a figure
fig = go.Figure(data=[bar_trace], layout=layout)

# Show the figure
fig.show()

Q2 Backwards verification: if we invested x amount of dollars in 2010, how much would it be worth in 2022, when would be a good/bad time to pull investment out of the market.

EDA

# Load the CSV files
stocks = pd.read_csv('data/big_tech_stock_prices.csv')
companies = pd.read_csv('data/big_tech_companies.csv')
# Copy the DataFrames to avoid modifying the original data
df_stocks = stocks.copy()
df_companies = companies.copy()
# Now you can proceed with your groupby aggregation
investment_stats = df_stocks.groupby('stock_symbol').agg({
    'open': ['mean', 'median', 'min', 'max', 'std'],
    'high': ['mean', 'median', 'min', 'max', 'std'],
    'low': ['mean', 'median', 'min', 'max', 'std'],
    'close': ['mean', 'median', 'min', 'max', 'std'],
    'adj_close': ['mean', 'median', 'min', 'max', 'std'],
    'volume': ['mean', 'median', 'min', 'max', 'std']
})
print("Summary Statistics:")
print(investment_stats)
Summary Statistics:
                    open                                                 \
                    mean      median        min         max         std   
stock_symbol                                                              
AAPL           51.274171   29.745001   6.870357  182.630005   47.316509   
ADBE          186.023938   97.589996  22.969999  696.280029  173.562115   
AMZN           58.937204   36.325001   5.296500  187.199997   54.138607   
CRM           103.419948   76.290001  15.522500  310.000000   71.287894   
CSCO           33.493880   29.500000  13.930000   64.040001   12.643887   
GOOGL          49.149625   38.521000  10.968719  151.250000   35.809816   
IBM           148.427471  143.173996  90.439774  205.908218   23.998357   
INTC           36.481406   34.270000  17.879999   68.199997   12.912534   
META          147.874275  141.620002  18.080000  381.679993   86.734656   
MSFT          100.048490   55.660000  23.090000  344.619995   88.228008   
NFLX          188.242206  110.010002   6.960000  692.349976  178.967463   
NVDA           50.560335   11.902500   2.180000  335.170013   69.538684   
ORCL           46.242177   41.750000  21.459999  104.290001   16.864193   
TSLA           58.859467   16.229000   1.076000  411.470001   95.677282   

                    high                                                 ...  \
                    mean      median        min         max         std  ...   
stock_symbol                                                             ...   
AAPL           51.845876   29.980000   7.000000  182.940002   47.926721  ...   
ADBE          188.208985   98.239998  23.360001  699.539978  175.674664  ...   
AMZN           59.610805   36.500000   5.564500  188.654007   54.785319  ...   
CRM           104.720219   77.160004  15.625000  311.750000   72.149412  ...   
CSCO           33.804014   29.770000  14.120000   64.290001   12.760317  ...   
GOOGL          49.638485   38.930000  11.068068  151.546494   36.215672  ...   
IBM           149.571376  144.160004  93.441681  206.405350   24.013109  ...   
INTC           36.875197   34.580002  17.920000   69.290001   13.090012  ...   
META          149.757980  143.415000  18.270000  384.329987   87.818979  ...   
MSFT          101.039621   56.000000  23.320000  349.670013   89.181105  ...   
NFLX          191.178020  111.900002   7.178571  700.989990  181.488846  ...   
NVDA           51.504697   11.982500   2.262500  346.470001   70.979752  ...   
ORCL           46.696918   42.000000  21.680000  106.339996   17.059092  ...   
TSLA           60.174863   16.491000   1.108667  414.496674   97.873400  ...   

               adj_close                                                 \
                    mean      median        min         max         std   
stock_symbol                                                              
AAPL           49.445122   27.385101   5.846675  180.959732   47.810585   
ADBE          186.022299   97.720001  22.690001  688.369995  173.466083   
AMZN           58.905287   36.382500   5.430500  186.570496   54.085034   
CRM           103.400510   76.260002  15.520000  309.959991   71.213839   
CSCO           28.624463   23.570276   9.743538   61.521923   13.590111   
GOOGL          49.148954   38.538502  10.912663  149.838501   35.803048   
IBM           113.148579  113.695961  75.138626  150.570007   14.336266   
INTC           31.310983   28.492294  12.135988   64.383247   13.745841   
META          147.913244  142.065002  17.730000  382.179993   86.763257   
MSFT           95.285446   50.052330  17.769510  339.924835   89.253596   
NFLX          188.252178  110.099998   7.018571  691.690002  178.877130   
NVDA           50.282939   11.685297   2.037410  333.407379   69.495321   
ORCL           42.572566   37.007454  17.991089  101.501656   17.867223   
TSLA           58.805222   16.222334   1.053333  409.970001   95.544413   

                    volume                                                   
                      mean       median       min         max           std  
stock_symbol                                                                 
AAPL          2.563255e+08  166674000.0  35195900  1880998000  2.225768e+08  
ADBE          3.814337e+06    2948500.0    589200   108752400  3.598144e+06  
AMZN          8.833999e+07   74592000.0  17626000   848422000  5.309249e+07  
CRM           6.910973e+06    5548800.0   1084700    64562800  5.048860e+06  
CSCO          3.269656e+07   25482400.0   5720500   560040200  2.570963e+07  
GOOGL         6.018647e+07   41234000.0   9312000   592399008  4.957963e+07  
IBM           5.036545e+06    4345189.0   1247878    39814421  2.772073e+06  
INTC          3.607170e+07   29874600.0   5893800   199002600  2.123178e+07  
META          3.117815e+07   23239000.0   5913100   573576400  2.713267e+07  
MSFT          3.801647e+07   32280800.0   7425600   319317900  2.147328e+07  
NFLX          1.841485e+07   11961800.0   1144000   315541800  2.054316e+07  
NVDA          5.080613e+07   43395600.0   4564400   369292800  3.210953e+07  
ORCL          1.801856e+07   14699800.0   2754900   183503900  1.251053e+07  
TSLA          9.351647e+07   75914250.0   1777500   914082000  8.164780e+07  

[14 rows x 30 columns]

Data Wrangling

# Handle missing values(No missing values)
print("Null Values from Stocks dataset", stocks.isnull().sum())
print("Null Values from companies dataset", companies.isnull().sum())

# Outlier Detection and Treatment
# Calculate the Z-score for the 'adj_close' column
df_stocks['z_score'] = stats.zscore(df_stocks['adj_close'])
# Remove outliers beyond 3 standard deviations
df_stocks = df_stocks[df_stocks['z_score'].abs() <= 3]
Null Values from Stocks dataset stock_symbol    0
date            0
open            0
high            0
low             0
close           0
adj_close       0
volume          0
dtype: int64
Null Values from companies dataset stock_symbol    0
company         0
dtype: int64

Data Visualization

stocks = companies['stock_symbol'].values.tolist()

# Creating a DataFrame to hold all the stock data
all_stock_data = df_stocks

# Create a Plotly figure
fig = go.Figure()

# Add a line for each stock
for stock in stocks:
    stock_data = df_stocks[df_stocks['stock_symbol'] == stock]
    fig.add_trace(go.Scatter(x=stock_data['date'], y=stock_data['close'], mode='lines', name=stock))

# Customize the layout
fig.update_layout(
    title='Stock Close Prices Over Time',
    xaxis_title='Date',
    yaxis_title='Close Price',
    height=800
)

# Show the figure
fig.show()
start_date = "2010-01-01"
filtered_data = all_stock_data[all_stock_data['date'] >= start_date]

# Normalize each stock's prices to start from a base (e.g., 100)
base = 10
for stock in stocks:
    initial_price = filtered_data[filtered_data['stock_symbol'] == stock].iloc[0]['close']
    filtered_data.loc[filtered_data['stock_symbol'] == stock, 'Normalized Close'] = base * (filtered_data['close'] / initial_price)

# Create a Plotly figure
fig = go.Figure()

# Add a line for each stock's normalized close prices
for stock in stocks:
    stock_data = filtered_data[filtered_data['stock_symbol'] == stock]
    fig.add_trace(go.Scatter(x=stock_data['date'], y=stock_data['Normalized Close'], mode='lines', name=stock))

# Customize the layout
fig.update_layout(
    title='Normalized Stock Close Prices Over Time',
    xaxis_title='Date',
    yaxis_title='Normalized Close Price',
    height=600
)

# Show the figure
fig.show()

To answer the question “when would be a good/bad time to pull investment out of the market?” we rely on a common trading strategy based on moving averages:

The 50-day moving average (50DMA) is often used as a short-term trend indicator. It’s more sensitive to price changes in the recent past.

The 200-day moving average (200DMA) is commonly used as a long-term trend indicator. It’s less sensitive to daily price fluctuations and more indicative of a longer-term trend.

When the 50DMA crosses above the 200DMA, it’s often referred to as a “golden cross,” which some traders interpret as a bullish signal suggesting that the price might increase, potentially a good time to buy.

Conversely, when the 50DMA crosses below the 200DMA, it’s known as a “death cross,” which is considered by some traders to be a bearish signal, indicating that the price might drop, and it could be a good time to sell.

df_AAPL = df_stocks[df_stocks['stock_symbol'] == 'AAPL']

start_date = "2013-01-01"
df_AAPL = df_AAPL[df_AAPL['date'] >= start_date]

# Calculate 200DMA and 50DMA
df_AAPL['200DMA'] = df_AAPL['close'].rolling(window=200).mean()
df_AAPL['50DMA'] = df_AAPL['close'].rolling(window=50).mean()

# Creating the candlestick chart
fig = go.Figure()

# Add Candlestick trace
fig.add_trace(go.Candlestick(x=df_AAPL['date'],
                open=df_AAPL['open'], high=df_AAPL['high'],
                low=df_AAPL['low'], close=df_AAPL['close'],
                name='Candlestick'))

# Add 200DMA Line trace
fig.add_trace(go.Scatter(x=df_AAPL['date'], y=df_AAPL['200DMA'],
                         mode='lines', name='200DMA',
                         line=dict(color='blue', width=2)))

# Add 50DMA Line trace
fig.add_trace(go.Scatter(x=df_AAPL['date'], y=df_AAPL['50DMA'],
                         mode='lines', name='50DMA',
                         line=dict(color='red', width=2)))

# Customize the layout
fig.update_layout(title='Stock Price with 200DMA and 50DMA',
                  xaxis_title='Date',
                  yaxis_title='Price',
                  xaxis_rangeslider_visible=False,  # Hides the range slider
                  height=600)

# Show the figure
fig.show()